Using Google Street View for virtual observations of neighborhoods and dwelling units: A feasibility study

In face-to-face household surveys, field interviewers are sometimes asked to record characteristics of the dwelling unit at the sampled address, as well as its surroundings, before making contact with a household member living there. Field interviewer observations of this kind are used to improve the efficiency of field data collection and for nonresponse adjustment. However, field interviewer observations can be expensive, and their quality needs to be improved. Recently, survey organizations have started to use Google Street View (GSV) to conduct virtual observations of the dwelling unit and the neighborhood. This paper reports a feasibility study that assesses whether virtual observations can be conducted, evaluates their agreement with field interviewer observations, and examines whether virtual observations correlate with survey response status and survey estimates. We found moderate to high agreement between virtual and interviewer observations. We also found that some observations are significantly related to response status and survey estimates. However, virtual observations using GSV have coverage issues, which could limit their potential use.


Background
In the era of declining survey response rates, survey researchers use data generated in the survey production process to facilitate data collection and reduce survey errors [1]. In face-to-face household surveys, these data usually include information reported by address listers, observations made by field interviewers about neighborhoods and/or dwelling units, and contact history data recorded in computer databases. Because they are generated in the operational process and are available for both respondents and nonrespondents [2], these auxiliary data can be used to monitor fieldwork progress [3], prioritize cases in the field [4], inform adaptive survey designs [5], and construct nonresponse adjustment weights [6].
Of interest to this paper are field interviewer observations of the sampled addresses and of the neighborhoods in which they are located [6]. To conduct observations of this type, interviewers are instructed to walk around the sampled address, observe its characteristics, check its surroundings, and record their observations in an electronic system (e.g., via a coding form) before making any contact with a household member living at the sampled address. Commonly collected observation measures include unit type, access barriers, condition of the unit, presence of children, and indicators of safety concerns [7][8][9][10].
Neighborhood and housing unit characteristics are important to survey research in general and to crime and victimization surveys in particular. Research has shown that people living in urban areas [11], in neighborhoods with higher crime rates [11], in neighborhoods with lower income [12], and in neighborhoods with renters and/or apartment buildings [12] tend to have a lower propensity to respond to survey requests. These characteristics are also related to crime rates and victimization experiences [13]. As a result, neighborhood and housing unit characteristics are ideal candidates for adaptive designs and nonresponse adjustment, and they are important to collect because they are related to both response rates and survey estimates of crime and victimization [14].
Although interviewer observations are convenient to collect, studies evaluating their quality found that they are potentially prone to measurement error ([10]; see [2] for an exception) and interviewer variance [6,10], and that their missing data rates vary [9,10,15]. Evidence is mixed on the predictive power of interviewer observations for response propensity [10]. One study showed that interviewer observations of housing units have more predictive power for response status than area-level information [16]. Other studies found only a limited ability of these observations to predict response status [9,17,18]. Another study assessing interviewer observations in the National Health Interview Survey found that although interviewers did record the observations, they did not necessarily conduct them before the first personal contact, as instructed [8].
With the development of remote sensing technologies used in geospatial data collection, Google Street View (GSV) has become a popular resource for virtually inspecting locations searched via Google Maps. First released in the United States in 2007, GSV provides free, interactive panoramic views of streets and their surroundings. GSV images have traditionally been collected by cars equipped with special cameras; the photos are then aligned and stitched together to offer a 360-degree view of an area. More recently, mobile phone users can also contribute to GSV by uploading pictures through the Street View app, which allows for more frequent updates and wider coverage of locations.
A growing number of studies in various disciplines have used GSV, as noted in [19]. For example, GSV images have been used in urban planning studies to assess the abundance of street greenery and to identify land use [20,21]. In the public health literature, several studies have measured neighborhood walkability and facility access with virtual observations conducted with GSV and have examined these measures as correlates of health outcomes [22][23][24]. With GSV, virtual observations can be conducted remotely without going into the field, saving time and cost [19,25]. Recently, advanced computing algorithms have been applied to GSV images [26]. However, past studies have noted potential pitfalls in using GSV for research purposes, including incomplete coverage and low-resolution images [27,28].
To carry out virtual observations, most studies so far have used street segments or block faces as units of observation and collected measures of street conditions and neighborhood disorder [21]. For example, Wu and colleagues developed a residential environment assessment tool that used GSV images to identify litter at the street level and broken windows at the property level in both urban and rural areas [29]. Odgers and colleagues evaluated street safety, decay, and disorder via GSV observations in the UK to assess environmental risks for children [30].
Studies assessing agreement between GSV and field observations found higher concordance for objective items and lower concordance for subjective items [31]. In addition, small and intermittent items (such as litter) were found to be difficult to identify via virtual observations [21,27].
Two studies have evaluated virtual observations through GSV against field interviewer observations in household surveys. Vercruyssen and Loosveldt showed that virtual observations and interviewer observations were similarly predictive of nonresponse [28]. Ren and colleagues evaluated both types of observations on completeness, validity, variability and/or reliability, and predictive power [10]. They found that interviewer observations have less missing data and are more predictive of response status than virtual observations, but that both types of observations need improvement in validity and reliability [10].
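The current study extends research on virtual observations through GSV and addresses four research questions:
1. What are the availability and quality of GSV images for conducting virtual observations?
2. How well do virtual observations agree with field interviewer observations?
3. How accurate are virtual observations when validated against survey self-reports?
4. Do virtual observations predict screener response status and key survey estimates?
To our knowledge, this study is the first to quantify the availability and quality of GSV images and the first to examine the predictive power of virtual observations for key survey outcomes.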

Methods
This study used addresses sampled for the 2020 National Crime Victimization Survey Redesign Field Test (NCVS-R). The NCVS collects data on personal and household victimization from a nationally representative sample of residential addresses. Interviews were conducted using Computer-Assisted Personal Interviewing (CAPI), with a small number administered by phone from the interviewer's home. Data collection began on October 28, 2019, and continued through March 31, 2020. We obtained informed consent by having field interviewers review the consent with each sampled person and ask for their verbal agreement. Field interviewers were allowed to start the survey only after obtaining verbal agreement. There was no other documentation of verbal consent, and no witness was required to be present during the verbal consent process. For anyone under 18, we obtained parental permission before approaching the youth for an interview. Youth were also administered an assent, similar to the adult consent. Youth respondents were interviewed only after both parental consent and youth assent had been obtained. NCVS-R was reviewed and approved by the Westat Institutional Review Board.
For this study, 1,341 sampled dwelling-unit addresses were randomly selected from the NCVS-R sample. They include screener respondents and nonrespondents, urban and rural addresses, and households with varying self-reported crime and victimization status.

Field interviewer observations
Field interviewers were instructed to conduct on-site observations of household and neighborhood conditions at sample addresses before contacting anyone at the address. Interviewer observations cover dwelling unit (DU) type, neighborhood income level, any signs of children, and street and neighborhood conditions. The observation items are listed in Table 1.

Virtual observations via GSV
Two virtual observers conducted virtual observations with GSV images. To retrieve street view images, virtual observers were instructed to first search for the sample address using Google Maps (maps.google.com) and browse the street view images available for the area. They were instructed to confirm that they were viewing the correct dwelling unit by checking the house number visible on the housing structure. For condos and apartment buildings where the exact unit could not be directly observed, virtual observers were advised to observe the building structure as a proxy for the dwelling unit. To evaluate neighborhood conditions, virtual observers were directed to browse street view images of adjacent blocks around the address. When GSV images were unavailable for a dwelling unit, virtual observers were asked to use Google Earth View to identify the land type and dwelling unit type and to code the other observational items as missing.
Virtual observers conducted the same observations as field interviewers, as shown in Table 1. In addition, they were instructed to record information on the quality of the street view images, such as whether the images displayed any inconsistencies.
The two virtual observers each conducted virtual observations for half of the addresses, and both coded a random selection of 50 addresses. The average of the Cohen's kappa coefficients based on these 50 double-coded addresses was 0.61. The two virtual observers met with the research team to resolve all inconsistencies and applied the resolutions to the remaining addresses. After all addresses were coded, the team met again and resolved all remaining inconsistencies.
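The double-coding design described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's procedure: the function `assign_addresses`, the seed, the placeholder address IDs, and the reading that the 50 double-coded addresses are drawn first and the remaining addresses are split evenly between the two coders are all hypothetical.

```python
import random

def assign_addresses(address_ids, n_double=50, seed=2020):
    """Split addresses between two virtual observers (hypothetical design).

    n_double addresses are drawn at random and given to BOTH coders so that
    inter-coder agreement can be measured; the rest are split in half.
    """
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    ids = list(address_ids)
    rng.shuffle(ids)
    double = ids[:n_double]
    rest = ids[n_double:]
    half = len(rest) // 2
    coder_a = double + rest[:half]
    coder_b = double + rest[half:]
    return coder_a, coder_b, double

# Usage with 1,340 placeholder address IDs.
coder_a, coder_b, double = assign_addresses(range(1, 1341))
print(len(double), len(coder_a), len(coder_b))  # 50 695 695
```

The overlap between the two coders' lists is exactly the double-coded subset, which is what the kappa coefficients are computed on.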

Availability and quality of GSV images for virtual observations
We first examined the availability and quality of GSV images for the selected sample addresses. As shown in Fig 1, about 19% of the 1,340 sample addresses did not have any street view images available. Among the 1,092 addresses with street view images, virtual observers could verify that the images displayed the correct dwelling unit for 71%. Verification was done by confirming the house number on the housing structure or elsewhere nearby (e.g., visible on the sidewalk or printed on the mailbox in front of the unit), or by inferring the house number from adjacent housing units.
Among addresses for which virtual observers could confirm or infer that the images showed the correct address, 31 had street view images that were not suitable for observation. These images were partially missing or blurry (Fig 2A), too far away to observe in detail (Fig 2B), or contained inconsistencies (e.g., multiple images stitched together in a way that affected observation of the unit). After excluding these cases, our analysis dataset contains 743 addresses with virtual observations conducted via good-quality Google Street View images. In other words, virtual observations could be conducted for only 55% of sample addresses.
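The coverage funnel in this section reduces to a few ratios. The sketch below recomputes the reported percentages from the counts given in the text; the number of verified addresses (774) is not reported directly and is derived here as 743 + 31, an inference that reproduces the reported 71% and the 4% cited in the Discussion.

```python
# Counts reported in this section. VERIFIED is derived, not reported:
# the 743 analyzed addresses plus the 31 excluded for unsuitable images.
TOTAL = 1340
WITH_IMAGES = 1092
UNSUITABLE = 31
ANALYZED = 743
VERIFIED = ANALYZED + UNSUITABLE  # 774

def pct(part, whole):
    """Percentage rounded to a whole number, as reported in the text."""
    return round(100 * part / whole)

funnel = {
    "no_images": pct(TOTAL - WITH_IMAGES, TOTAL),  # share of all addresses
    "verified": pct(VERIFIED, WITH_IMAGES),         # share of addresses with images
    "unsuitable": pct(UNSUITABLE, VERIFIED),        # share of verified addresses
    "usable": pct(ANALYZED, TOTAL),                 # share of all addresses
}
print(funnel)  # {'no_images': 19, 'verified': 71, 'unsuitable': 4, 'usable': 55}
```

Note that each percentage has a different denominator, which is why the steps cannot simply be subtracted from 100%.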

Agreement between virtual observations and field interviewer observations
To answer Research Question 2, we assessed the agreement between the two sets of observations by computing the proportion of addresses with the same observations from virtual observers and field interviewers ($p_a$). In addition, we calculated Cohen's kappa as

$$\kappa = \frac{p_a - p_e}{1 - p_e},$$

where $p_a$ is the agreement rate and $p_e$ is the agreement rate expected by chance.
As the formula shows, Cohen's kappa takes chance agreement into account, whereas the agreement rate ($p_a$) does not.
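The agreement rate and Cohen's kappa can be computed directly from two coders' paired codes. The sketch below is a minimal pure-Python version for illustration; the function name and the dwelling-type labels in the example are hypothetical, and in practice a statistics package would normally be used. Here $p_e$ is computed as the sum, over categories, of the product of the two coders' marginal proportions, which is the standard chance-agreement term for Cohen's kappa.

```python
from collections import Counter

def agreement_and_kappa(codes_a, codes_b):
    """Return (p_a, p_e, kappa) for two coders' paired categorical codes.

    p_a   observed share of addresses given identical codes
    p_e   chance agreement: sum over categories of the product of the
          two coders' marginal proportions
    kappa (p_a - p_e) / (1 - p_e); undefined (NaN) when p_e == 1
    """
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty code lists of equal length")
    n = len(codes_a)
    p_a = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    marg_a, marg_b = Counter(codes_a), Counter(codes_b)
    p_e = sum(marg_a[c] * marg_b[c] for c in marg_a) / n ** 2
    kappa = float("nan") if p_e == 1 else (p_a - p_e) / (1 - p_e)
    return p_a, p_e, kappa

# Hypothetical dwelling-type codes for five addresses.
virtual = ["single", "single", "multi", "single", "mobile"]
interviewer = ["single", "multi", "multi", "single", "single"]
p_a, p_e, kappa = agreement_and_kappa(virtual, interviewer)
# Exact values: p_a = 3/5, p_e = 11/25, kappa = 2/7
```

In this toy example, 60% raw agreement shrinks to a kappa of about 0.29 once the 44% agreement expected by chance is removed.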
Table 2 presents agreement rates and kappas based on the 661 addresses that have both virtual observation and field interviewer observation data. Dwelling unit type exhibited the highest agreement between virtual observers and field interviewers: they agreed on 94% of the addresses, with a Cohen's kappa of 0.87, consistent with earlier findings [10]. The percentage of addresses coded as a one-unit or single-family house is 68% for virtual observers and 67% for field interviewers.
Table 2 notes: (1) The percentages are the percentages of addresses coded as a one-unit or single-family house, having signs that the dwelling unit was not well kept, having any security measures, having signs that the neighborhood was not well kept, having children at home, located in a low-income neighborhood, and located in a safe neighborhood. (2) Virtual observers and field interviewers could select more than one security measure on the observation form; a dummy-coded measure (having at least one security measure vs. not having any) is used for this table. (3) This observation item has five response categories, as shown in Table 1; for this table, we collapsed the two safe options and the two unsafe options and used three categories to calculate agreement and kappa. https://doi.org/10.1371/journal.pone.0307272.t002
Although the other observational items show only fair to slight agreement according to their Cohen's kappa values [32], their agreement rates are all above 70%. The low kappas are likely the result of skewed distributions and/or empty cells [33,34]. However, they also point to the need to improve the observational items and the training of both field interviewers and virtual observers.
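The point that skewed distributions can produce low kappas despite high agreement rates (a well-known property sometimes called the kappa paradox) can be illustrated with two hypothetical 2x2 tables that share the same 82% agreement rate. The counts are invented for illustration and are not taken from Table 2.

```python
def kappa_2x2(both_yes, a_only, b_only, both_no):
    """Agreement rate and Cohen's kappa for two raters coding a yes/no item.

    a_only: rater A says yes, rater B says no; b_only: the reverse.
    """
    n = both_yes + a_only + b_only + both_no
    p_a = (both_yes + both_no) / n
    a_yes = (both_yes + a_only) / n  # rater A's share of "yes"
    b_yes = (both_yes + b_only) / n  # rater B's share of "yes"
    p_e = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return p_a, (p_a - p_e) / (1 - p_e)

# Same 82% agreement, different prevalence (hypothetical counts, n = 100).
p_rare, k_rare = kappa_2x2(both_yes=2, a_only=8, b_only=10, both_no=80)
p_bal, k_bal = kappa_2x2(both_yes=41, a_only=9, b_only=9, both_no=41)
print(f"rare item:     agreement = {p_rare:.2f}, kappa = {k_rare:.2f}")  # 0.82, 0.08
print(f"balanced item: agreement = {p_bal:.2f}, kappa = {k_bal:.2f}")    # 0.82, 0.64
```

When a characteristic is rare, both raters agree on "no" most of the time, so chance agreement $p_e$ is already high (0.80 here) and little room is left for kappa to register agreement beyond chance.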
Overall, field interviewers coded more addresses than virtual observers as having signs that the dwelling unit was not well kept, having signs that the neighborhood was not well kept, and being located in a low-income neighborhood. Virtual observers, aided by GSV images, coded more addresses than field interviewers as having at least one type of security measure.

Validating virtual observations against survey data
To address Research Question 3 on the accuracy of virtual observations, we took advantage of two self-reported measures available in the survey data (the number of children in the household and household income) and used them to assess the accuracy of virtual observations. We first compared virtual observations of the presence of children at a dwelling unit with the self-reported number of children for the 436 addresses that had both pieces of information. The overall agreement rate between self-reports and virtual observations is 79%, and kappa is 0.10. For comparison, the agreement rate between self-reports and field interviewer observations is 78%, and kappa is only 0.01. To better understand the accuracy of virtual observations of children at dwelling units, we further divided addresses into those that self-reported having children (n = 57) and those that self-reported not having children (n = 379). As shown in Table 3, among addresses that self-reported having children, virtual observers observed signs of children at 23% of the addresses, whereas field interviewers observed signs of children at 14%. By contrast, among addresses that reported not having children, both virtual observers and field interviewers correctly flagged 87% as having no signs of children. Virtual observations through GSV therefore appear to be more accurate than interviewer observations in identifying households with children.
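The group-wise rates above are the sensitivity and specificity of the "signs of children" observation, with self-reports as the reference. The sketch below computes them together with the overall agreement rate and kappa. The cell counts are an assumption: they are back-calculated from the reported percentages (23% of 57 and 87% of 379) rather than taken from Table 3, although at the reported precision they do reproduce the 79% agreement rate and the kappa of 0.10.

```python
def classification_summary(tp, fn, fp, tn):
    """Compare observed signs of children with self-reported children.

    tp: children reported, signs observed     fn: children reported, no signs
    fp: no children reported, signs observed  tn: no children reported, no signs
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)  # households with children that were flagged
    specificity = tn / (tn + fp)  # households without children left unflagged
    p_a = (tp + tn) / n
    obs_yes, self_yes = (tp + fp) / n, (tp + fn) / n
    p_e = obs_yes * self_yes + (1 - obs_yes) * (1 - self_yes)
    kappa = (p_a - p_e) / (1 - p_e)
    return sensitivity, specificity, p_a, kappa

# Approximate cell counts back-calculated from the reported percentages
# (23% of 57 is about 13; 87% of 379 is about 330). Not the study's table.
sens, spec, p_a, kappa = classification_summary(tp=13, fn=44, fp=49, tn=330)
print(f"{sens:.0%} {spec:.0%} {p_a:.0%} kappa={kappa:.2f}")  # 23% 87% 79% kappa=0.10
```

The example shows why a 79% agreement rate can coexist with a kappa near 0.10: most of the agreement comes from the large group without children.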
Next, we used self-reported household income to evaluate virtual observers' and field interviewers' observations of neighborhood income level. Specifically, if the self-reported household income was less than $50,000, we coded the household as living in a low-income neighborhood; if it was $50,000 or more, we coded the household as living in a middle- or high-income neighborhood. We acknowledge that household income does not fully capture neighborhood income level, even though the two are expected to be correlated.
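The income-based classification rule above can be written as a small recoding function, together with the per-group share of correct classifications used in the comparisons that follow. The function names, the handling of missing income (returned as `None`), and the example records are hypothetical.

```python
from typing import Optional

LOW_INCOME_CUTOFF = 50_000  # dollars, the cutoff described in the text

def income_proxy(household_income: Optional[float]) -> Optional[str]:
    """Recode self-reported household income into a neighborhood-income proxy.

    Below $50,000 -> "low"; $50,000 or more -> "middle_or_high".
    Missing income -> None (hypothetical handling; excluded from comparisons).
    """
    if household_income is None:
        return None
    return "low" if household_income < LOW_INCOME_CUTOFF else "middle_or_high"

def share_correct(pairs, group):
    """Among (proxy, observed) pairs whose proxy equals `group`, the share observed as `group`."""
    hits = [observed == group for proxy, observed in pairs if proxy == group]
    if not hits:
        raise ValueError(f"no addresses in group {group!r}")
    return sum(hits) / len(hits)

# Hypothetical (self-reported income, observed neighborhood income) records.
records = [(30_000, "low"), (45_000, "middle_or_high"), (20_000, "middle_or_high"),
           (80_000, "middle_or_high"), (120_000, "middle_or_high"), (60_000, "low")]
pairs = [(income_proxy(inc), obs) for inc, obs in records]
print(share_correct(pairs, "low"), share_correct(pairs, "middle_or_high"))  # 1/3 and 2/3
```

Reporting the share correct separately for each self-report group, rather than only an overall rate, is what reveals the asymmetry between low-income and middle- or high-income neighborhoods.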
The overall agreement rate between self-reports and virtual observations of neighborhood income is 61%, with a kappa of 0.22, whereas the overall agreement rate between self-reports and field interviewer observations is 68%, with a kappa of 0.36. Not surprisingly, agreement between self-reported household income and observed neighborhood income is lower than agreement between self-reported and observed presence of children. Table 4 compares virtual and interviewer observations against the classification based on self-reported household income. Both virtual observers and field interviewers are better at characterizing neighborhood income level for addresses located in middle- or high-income neighborhoods. For addresses with a self-reported household income of $50,000 or more, virtual observers using GSV correctly judged 92% as located in a middle- or high-income neighborhood, and field interviewers correctly judged 86%. Again, virtual observations are closer to respondents' self-reports than interviewer observations. However, the reverse is true for low-income neighborhoods. About one third of addresses with a self-reported household income below $50,000 were correctly coded as located in a low-income neighborhood via virtual observations, whereas field interviewers correctly coded 51% of them.

Predicting screener response status with virtual observations
To assess the association between virtual observations and each address's screener response status, we fit a logistic regression model predicting the likelihood that a sample address responded to the screener survey, using only virtual observations as predictors. As shown in Table 5, virtual observations of neighborhood income level and neighborhood safety are significantly correlated with response status: households observed as located in a low-income neighborhood and households observed as located in a fairly or definitely safe neighborhood are more likely to be screener respondents, consistent with the literature [11,12]. Virtual observations of dwelling unit type and of signs that the neighborhood was not well kept are marginally predictive of screener response status. The pseudo-R² of the model is 0.04. For comparison, we fit the same logistic regression model using field interviewer observations. Its pseudo-R² is also 0.04, but field interviewer observations of dwelling unit type, presence of any security measures, neighborhood income level, and neighborhood safety level are significant predictors of screener response status.

Predicting household level crime incidence with GSV observations
We also examined the associations between virtual observations and self-reported victimization at the household level. We fit three logistic regression models, predicting whether the household reported any personal crime, any property crime, or any crime (either personal or property crime), respectively. The model estimates for virtual observations (as odds ratios), together with those for field interviewer observations, are displayed in Table 6.
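The composite "any crime" outcome and the odds ratios reported in Table 6 can be illustrated with a simplified case. With a single binary predictor, the logistic-regression odds ratio exp(β) equals the cross-product ratio of the 2x2 table, and its Wald standard error on the log scale equals Woolf's sqrt(1/a + 1/b + 1/c + 1/d). The models in Table 6 include several predictors, so this one-predictor sketch with hypothetical households is only a worked example of the quantities involved.

```python
import math

def two_by_two(records):
    """Tabulate (a, b, c, d) from (predictor, outcome) pairs of 0/1 values.

    a: predictor=1, outcome=1   b: predictor=1, outcome=0
    c: predictor=0, outcome=1   d: predictor=0, outcome=0
    """
    a = sum(1 for x, yv in records if x and yv)
    b = sum(1 for x, yv in records if x and not yv)
    c = sum(1 for x, yv in records if not x and yv)
    d = sum(1 for x, yv in records if not x and not yv)
    return a, b, c, d

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald (Woolf) 95% CI; all four cells must be non-zero."""
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical households: (observed_flag, personal_crime, property_crime).
households = ([(1, 1, 0)] * 5 + [(1, 0, 1)] * 15 + [(1, 0, 0)] * 80
              + [(0, 1, 1)] * 10 + [(0, 0, 1)] * 20 + [(0, 0, 0)] * 270)
# Composite outcome: any crime = personal OR property crime.
records = [(flag, int(pers or prop)) for flag, pers, prop in households]
a, b, c, d = two_by_two(records)
print((a, b, c, d), [round(v, 2) for v in odds_ratio_ci(a, b, c, d)])
# (20, 80, 30, 270) [2.25, 1.21, 4.18]
```

A household reporting both crime types counts once in the "any crime" model, which is why that outcome is not simply the sum of the other two.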
None of the virtual observations is significantly related to reporting of personal crime, but field interviewer observations of signs that the neighborhood was not well kept are associated with a significantly higher likelihood that the sample address reported personal crime victimization.
Two virtual observations (dwelling unit type and neighborhood income level) are marginally significant predictors of property crime, whereas field interviewer observations of the presence of children are marginally related to reporting of property crime victimization.

Discussion
This study evaluated the use of Google Street View to conduct virtual observations of sampled addresses in a household crime survey. We first examined the availability and quality of GSV images and found that 19% of the addresses did not have GSV images. Among addresses with GSV images, coders could confirm that the images showed the correct dwelling unit for 71%. A further 4% of the addresses confirmed as showing the correct dwelling unit were not suitable for virtual observation. As a result, virtual observations could be conducted for only 55% of sample addresses; in other words, close to half of the addresses would not be covered if researchers relied only on virtual observations via GSV. These findings are consistent with the literature on incomplete coverage of GSV [22]. We further checked the recency of the images and found that, among dwelling units with street view images available, about half of the images were taken in the three years before the virtual observations (2017, 2018, 2019). For 4% of addresses, the images had been captured 10 years earlier, which is empirical evidence for the time lag issues cited in [10,22]. We advise researchers interested in using virtual observations to note these time lag issues and to evaluate the impact of the time lag on both the feasibility of using virtual observations and the accuracy of the virtual observation results.
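The recency check described above amounts to computing image age at the time of coding. The sketch assumes the virtual observations were conducted in 2020 (the text implies this but does not state a date) and treats "10 years earlier" as an age of 10 or more years; both are assumptions, and the capture years in the example are hypothetical.

```python
from collections import Counter

def image_age_summary(capture_years, observation_year=2020):
    """Summarize how old street view images were when they were coded.

    observation_year is an assumption: the text implies virtual observations
    took place after 2019 but does not give the exact date.
    """
    ages = [observation_year - year for year in capture_years]
    n = len(ages)
    recent = sum(1 for age in ages if 1 <= age <= 3) / n  # 2017-2019 if observed in 2020
    old = sum(1 for age in ages if age >= 10) / n         # captured 10+ years earlier
    return recent, old, Counter(capture_years)

# Hypothetical capture years for eight addresses.
recent, old, by_year = image_age_summary([2019, 2018, 2017, 2019, 2015, 2012, 2010, 2016])
print(recent, old)  # 0.5 0.125
```

If image capture dates are recorded during coding, the same summary can be used to flag addresses whose images may predate visible changes to the dwelling or neighborhood.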
The agreement between interviewer and virtual observations is fair to high, with all agreement rates above 70%. This is consistent with previous literature comparing virtual and on-site observations (e.g., [10]). For observations where agreement is high (e.g., type of dwelling unit, signs that the dwelling unit was not well kept) or where virtual observations are more accurate than interviewer observations relative to survey self-reports (such as presence of children), virtual observations conducted via GSV images can be an attractive replacement for interviewer observations as a source of auxiliary data. Replacing field interviewer observations with virtual observations could potentially save the cost and effort that field interviewers would otherwise invest to collect observations in the field. In addition, virtual observers can extend the area they browse to a few blocks around the sample address to get a better picture of the neighborhood, which would cost less than asking field interviewers to travel a few blocks around the sample address.
Further work is needed to develop the observational items and to train virtual observers, especially on items that are subjective in nature. While agreement rates were relatively high, the reliability of the measures was quite low. The discrepancy likely reflects the difficulty of coding characteristics that are relatively uncommon (e.g., a dangerous neighborhood). Virtual observers generally agreed when a characteristic was clearly present but were less consistent when it was not straightforward. This points to the need to improve procedures for both virtual observers and field interviewers.
We examined the power of virtual observations based on GSV images to predict response status and household-level crime incidence. We found that virtual observations have explanatory power comparable to that of field interviewer observations in explaining variation in the likelihood to respond to the screener interview and to report crime victimization. Furthermore, virtual observations of dwelling unit type are marginally significantly related to a sample address's likelihood to respond to the screener interview and significantly related to its likelihood to report any crime victimization, making this item an ideal candidate for nonresponse adjustment.
Another advantage of virtual observations through GSV images is that they are conducted in a controlled environment and are thus less prone to environmental or contextual factors that would otherwise affect interviewers' observations in the field. Furthermore, monitoring virtual observers working with GSV images is easier than monitoring field interviewers conducting observations on site, and close monitoring by in-house supervisors could improve the performance of virtual observers. Recent developments in GSV methodology [26] have the potential to assist virtual observers and improve their performance.
A major limitation of this study is that its findings are based on a small sample of addresses in a crime survey, which restricts the power to detect small differences and the ability to generalize our results to other surveys. We suggest that future research replicate our methodology and evaluate other types of virtual observations.

Fig 2. Examples of street view images not suitable for virtual observation. (A) Image is partially missing. (B) Housing unit is too far to observe in detail. https://doi.org/10.1371/journal.pone.0307272.g002